Indexing Structures Derived from Syntax in TREC-3: System Description

Authors

Alan F. Smeaton

Ruairi O'Donnell

Fergus Kelledy

Abstract

This paper describes an approach to information retrieval based on a syntactic analysis of document texts and user queries and, from that analysis, the construction of tree structures (TSAs) that encode and capture language ambiguities. TSAs are constructed at the clause level, so each document can yield many TSAs and each query may be represented by several TSAs. The TSAs from documents and from queries are matched, the degrees of overlap between individual TSAs are computed, and these are aggregated to yield a score for each document, which is then used to rank the collection. This paper presents the system description for benchmarking our retrieval strategy on category B of TREC-3, i.e. on c. 550 Mbytes of Wall Street Journal newspaper texts. The implementation is based on a two-stage retrieval in which a statistically based pre-fetch retrieves the set of WSJ articles to be passed to the more computationally expensive language-based processing. The results of our retrieval system in terms of precision and recall are disappointing, and an analysis of why is also included. Part of this analysis is a direct comparison between our system and some mainstream IR approaches. In addition to performing ad hoc retrieval on texts in English, we have also performed ad hoc retrieval on texts in Spanish using a weighted trigram approach; this is outlined, with performance results, in an appendix.
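The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify how TSAs are built, matched, or aggregated, so each clause is represented here by a plain set of tokens as a hypothetical stand-in for a TSA, overlap is measured with a Jaccard ratio, and the aggregate is the sum of each query structure's best match. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def prefetch(query_terms, docs, k):
    """Stage 1 (assumed): cheap statistical ranking with a tf-idf sum; keep top k."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(text.lower().split()))
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        scores[doc_id] = sum(
            tf[t] * math.log((n + 1) / (df[t] + 1)) for t in query_terms if t in tf
        )
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]


def clause_structures(text):
    """Hypothetical stand-in for TSA construction: one token set per clause."""
    clauses = [c for c in re.split(r"[.;,]", text.lower()) if c.strip()]
    return [frozenset(c.split()) for c in clauses]


def overlap(a, b):
    """Assumed overlap measure: Jaccard ratio between two clause structures."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_document(query_structs, doc_structs):
    """Assumed aggregation: sum of each query structure's best overlap."""
    return sum(
        max((overlap(q, d) for d in doc_structs), default=0.0)
        for q in query_structs
    )


def two_stage_rank(query, docs, k=2):
    """Stage 2 re-ranks only the documents returned by the pre-fetch."""
    candidates = prefetch(query.lower().split(), docs, k)
    q_structs = clause_structures(query)
    scored = {
        d: score_document(q_structs, clause_structures(docs[d])) for d in candidates
    }
    return sorted(scored, key=lambda d: (-scored[d], d))
```

The point of the split is that the expensive structure matching in `score_document` only runs on the `k` candidates that survive the pre-fetch, rather than on the whole collection.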

Related resources

UNT Medical Information Retrieval at TREC 2016

This paper provides a description of a project to design and evaluate an information retrieval system for clinical decision support track. The target document collection for retrieval consisted of 1.25 million biomedical related documents taken from the Open Access Subset of PubMed Central. The topics provided by TREC for query construction consisted of 30 patient narrative cases, each of which...

Query-Structure Based Web Page Indexing

Indexing is a crucial technique for dealing with the massive amount of data present on the web. In our third participation in the web track at TREC 2012, we explore the idea of building an efficient query-based indexing system over Web page collection. Our prototype explores the trends in user queries and consequently indexes texts using particular attributes available in the documents. This pa...

LAMDA at TREC CDS track 2015 - Clinical Decision Support Track

In the TREC 2015 Clinical Decision Support Track, our goal is to retrieve relevant medical articles for questions about medical statements. We propose three main strategies: indexing, query expansion, and ranking. In the indexing stage, each medical article is indexed into 3 different fields: title, abstract, and body. Before querying, related words are appended to the query at ...

Ranking Web Pages Using Collective Knowledge

Indexing is a crucial technique for dealing with the massive amount of data present on the web. Indexing can be performed based on words or on phrases. Our approach aims to efficiently index web documents by employing a hybrid technique in which web documents are indexed in such a way that knowledge available in the Wikipedia and in meta-content is efficiently used. Our preliminary experiments ...

Optimizing Document Indexing and Search Term Weighting Based on Probabilistic Models

We describe the application of probabilistic indexing and retrieval methods to the TREC material. For document indexing, we apply a description-oriented approach which uses relevance feedback information from previous queries run on the same collection. This method is also very flexible w.r.t. the underlying document representation. In our experiments, we consider single words and phrases and use...

Publication date: 1994
